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1.  Introduction.  In  recent  years,  there  has  been  considerable  research  on  fast  al¬ 
gorithms  for  the  solution  of  linear  systems  of  equations  with  Toeplitz  matrices.  The 
Levinson  and  Schur  algorithms  allow  solutions  with  O(n^)  floating  point  operations 
(flops)  for  systems  with  «  X  «  Toeplitz  matrices. 

In  1980  Brent,  Gustavson,  and  Yun  [5]  described  a  scheme  for  obtaining  a  solution 
with  0(n  log^  «)  flops.  This  was  based  on  two  ideas — the  use  of  the  Gohberg-Semenciil 
formula  [1 1],  [13],  [17],  [26]  for  the  inverse  of  a  Toeplitz  matrix,  and  the  use  of  divide- 
and-conquer  (or  doubling)  techniques  for  computing  (generators  of)  the  Gohbei^- 
Semencul  formula. 

Let  X  and  y  denote  the  first  and  last  columns  of  T~'  e  R"^".  Then  if  the 
first  component  of  x,  say  Xi,  is  nonzero,  Gohberg  and  Semencul  [13]  showed  that  we 
could  write 


r-'=-[L(x)L''(/„y)-L(Z„y)L^(Z„4x)],  x,#0, 

where  /„  is  the  reverse-identity  matrix,  Z„  is  the  shift  matrix. 


0 

1  0 

*  2||  “ 

1  oj 

and  L(v)  is  a  lower-triangular  Toeplitz  matrix  with  first  column  v.  The  significance  of 
the  Gohbeig-Semencul  formula  in  the  present  application  is  that  the  product  of  a  vector 
and  a  lower-  or  upper-triangular  Toeplitz  matrix  is  equivalent  to  the  convolution  of  two 
vectors,  which  can  be  done  using  0(n  log  n)  flops  (see,  e.g.,  [4]). 

Brent,  Gusta.son,  and  Yun  used  a  divide-and-conquer  scheme  for  a  certain  Euclidean 
algorithm  to  factorize  row-permuted  Toeplitz  matrices  (i.e.,  Hankel  matrices),  and  to 
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obtain  the  vectors  {x,  y}  of  the  Gohberg-Semencul  formula  with  0(.n  log^  n)  flops. 
Later  Bitmead  and  Anderson  [3]  and  Morf  [21]  used  another  approach  based  on  the 
displacement-rank  properties  of  matrix  Schur  complements,  to  obtain  similar  results; 
while  this  approach  allows  for  generalization  to  non-Toeplitz  matrices,  the  hidden  coef- 
flcient  in  their  proposed  0(/i  log^  «)  constructions  turned  out  to  be  extremely  lai^e  (see 
Sexton,  Shensa,  and  Speiser  [  25  ] ) .  Later  Musicus  [  22  ] ,  de  Hoog  [11],  Ammar  and  Gragg 
[2]  used  a  more  direct  approach  based  on  a  combination  of  the  Schur  and  Levinson 
algorithms  to  obtain  better  coefficients;  in  particular,  Ammar  and  Gragg  made  a  detailed 
study  and  claimed  an  operation  count  of  8n  log^  n  flops.  With  this  count,  the  new  (called 
superfast  in  [2])  method  for  solving  (exactly  determined)  Toeplitz  systems  is  faster  than 
the  one  based  on  the  Levinson  algorithm  whenever  n  >  256.  We  should  mention  here 
that  Schur-algorithm-based  methods  are  natural  in  the  context  of  transmission-line  and 
layered-earth  models,  so  it  is  not  a  surprise  that  similar  techniques  were  also  conceived 
in  those  fields  (see  Choate  [7],  McClary  [20],  Bruckstein  and  Kailath  [6]).  A  good 
source  for  background  on  the  Levinson  and  Schur  algorithms,  transmission  line  models, 
displacement  representations  as  mentioned  and  used  in  the  present  paper  may  be  [14]. 

The  method  we  have  taken  in  this  paper  is  in  the  spirit  of  the  generalized  Schur 
algorithm  (see,  e.g.,  [8],  [9]).  Our  algorithm  can  be  applied  to  non-Toeplitz  matrices, 
and  is  simpler  than  the  methods  of  Bitmead  and  Anderson  [  3  ]  or  Morf  [  2 1  ] .  Furthermore, 
we  can  readily  handle  matrices  such  as  (T^T)~'  and  where  T  may  be  a 

near-Toeplitz  matrix  or  a  rectangular  block-Toeplitz  matrix,  or  a  Toeplitz-block  matrix; 
in  particular,  therefore,  we  can  also  obtain  the  least-squares  solutions  of  overdetermined 
Toeplitz  and  near-Toeplitz  systems  with  0(n  log^  n)  flops.  Our  algorithm  is  closely  related 
to  the  algorithm  of  Musicus  [  22  ] .  However,  our  presentation  is  conceptually  much  simpler 
(especially  for  the  non-Toeplitz  cases  treated  in  [22])  than  previous  approaches;  in  par¬ 
ticular,  we  do  not  use  the  relationship  between  the  Schur  algorithm  and  Levinson  al¬ 
gorithms  needed  in  [2],  [1 1],  and  [22]. 

An  outline  of  our  approach  is  the  following.  For  a  matrix  E, 


(1) 


£'i.i ,  nonsingular. 


the  Schur  complement  of  E\^\  in  E  is 

E2,2~  E2,\Ex\E\2. 

Note  that  matrices  such  as 

(2)  Sx^T-\  S2^(T^Tr\  S^^(T^Tr'T^ 

can  be  identified  as  the  Schur  complements  of  the  northwest  blocks  in  the  following 
extended  matrices: 


(3) 


E,= 


T  I 
-I  O 


TT- 

O 


Now  the  matrices  £  in  ( 3 )  have  the  following  ( generalized )  displacement  representation , 
for  suitably  chosen  matrixes  { F* } : 


E=  i  K{Xi,Ff)K^(yi,F»), 

/-I 

where  A’(x/,  F^)  and  Affy,,  F*)  are  lower  triangular  matrices  whose  j  columns  are 
(F^yj-'^xi  and  (F*)‘-'“  ”yi.  respectively.  The  smallest  possible  number  a  is  called  the 
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displacement  rank  of  E  with  respect  to  f*}.  For  an  example,  let  Fbe  an  w  X  « 
scalar  Toeplitz  matrix,  with  m^n.  Then  the  matrix  E2  has  displacement  rank  four  with 
respect  to  { F,  F} ,  where  F  =  [q"  ,  and  has  a  displacement  representation  [15], 


2  4 

{4a)  £2=2  K(yi,F)K^{Xi,F)'  2  /i:(y.,F)£^(x„F), 

1=1  i=3 


X,. 


If  we  define  x[  ^  [wf,  v,^],  note  that  the  matrix  ATCx,,  F)  in  (4a)  has  the  form 


(4b) 


■L(w,)  O' 
.£(v,)  O. 


OeR” 


where  L(Wi)  and  L(v,)  are  lower  triangular  Toeplitz  matrices  with  first  columns  w, 
and  Vi. 

Given  a  displacement  representation  of  E,  we  use  a  certain  generalized  Schur  al¬ 
gorithm  (see  §  2)  to  successively  compute  displacement  representations  of  the  Schur 
complements  of  all  the  leading  principal  submatrices  in  E.  For  the  above  example,  n 
steps  of  the  generalized  Schur  algorithm  will  yield 

2  4 

=  2  £(u„F)£’"(u,.F)-  2  £(Ui,F)£^(Ui,F), 

» =  J  1  =  3 


o  o 

p  (T^T)-' 


where  the  top  n  elements  of  Ui  are  zero.  Therefore,  if  we  denote  the  bottom  n  elements 
of  u,  as  U2,,,  we  can  have  the  displacement  representation 

2  4 

(T^T)-'  =  2  L(U2j)L^(U2.,)-  2  L(U2,)L^(U2j). 

<  = I  1  =  3 


Now,  the  generalized  Schur  algorithm,  which  is  a  two-term  polynomial  recursion, 
can  be  implemented  in  a  divide-and-conquer  fashion  with  0(a^f(n)  log  n)  flops,  where 
/(«)  denotes  the  number  of  operations  for  the  multiplication  of  two  polynomials.  There¬ 
fore,  if  the  multiplication  of  two  polynomials  is  done  again  by  divide-and-conquer,  i.e., 
by  using  fast  convolution  algorithms,  then  the  overall  computation  requires  0(a^n  log^  n) 
flops.  Once  we  have  a  displacement  representation  of  the  desired  Schur  complement  S, 
the  matrix- vector  multiplication,  5b,  can  be  done  with  0(  an  log  n)  flops  using  fast  con¬ 
volutions.  As  an  example,  we  can  obtain  the  least  squares  solution  for  the  Toeplitz  system 

rx  =  b,  TeR”'^",  m^n 


as  follows; 

(i)  Form  T^h  using  two  fast  convolutions, 

( ii )  Obtain  a  displacement  representation  of  (  t^T)~'  using  the  divide-and-conquer 
version  of  the  generalized  Schur  algorithm, 

(iii)  Form  ( T^T)~*(T^h)  using  eight  fast  convolutions. 

If  we  had  obtained  the  displacement  representation  of  (r’^F)“'r^  directly  (using  £3), 
then  step  (i)  above  would  not  be  needed. 

2.  Generalized  Schur  algorithm.  After  a  brief  review  of  basic  concepts  and  defini¬ 
tions,  we  shall  describe  the  generalized  Schur  algorithm  of  references  [8],  [9],  and  [15], 
but  in  a  polynomial  form  important  for  the  divide-and-conquer  implementations.  We 
shall  need  to  recall  some  definitions  and  basic  properties. 
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Generators  of  matrices.  Let  F^and  F*  be  nilpotent  matrices.  The  matrix 

V^F^,FX)^=A~FUF‘>'^ 

is  called  the  displacement  of  A  with  respect  to  the  displacement  operators  { F^ ,  F* } . 
Define  the  (F-^,  F*) -displacement  rank  of  ^  as  rank  Any  matrix  pair 

{X,Y}  such  that 

(5)  v^Ff,F>’)A=XY'^,  •  •  •  ,x„],  T.=  [yi,y2,  •  ■  •  .yj 


is  called  a  (vector  form)  generator  of  A  with  respect  to  {F^,  F*} .  The  generator  will  be 
said  to  have  length  a.  If  the  length  a  is  equal  to  the  displacement  rank  of  ^ ,  we  say  that 
the  generator  is  minimal.  A  generator  such  as  F  =  A"2,  where  S  is  a  diagonal  matrix 
with  1  or  -1  along  the  diagonal,  is  called  a  symmetric  generator. 

The  following  lemma  [  1 5  ] ,  [  1 6  ]  establishes  the  connection  between  generators  and 
displacement  representations. 


Lemma.  Let  E  be  an  mX  n  matrix.  If  F^ and  F*  are  nilpotent,  then  the  equation 
V(//f»)F  =  2i  x,y,^  has  the  unique  solution  F  =  2"  F(x,,  F^)F^(y,,  F*),  where 


K(Xi,  FO  -  [x„  F^x„  • .  ■  ,  F^<"-  '>x,]  andKiyj,  F*)  -  [y„  F*y„ 


Choice  of  displacement  operators.  The  generalized  Schur  algorithm  operates  with 
generators,  and  needs  O(amn)  flops  for  sequential  implementation  and  0{a^n  log^  n) 
for  divide-and-conquer  implementation.  Therefore,  for  a  given  matrix  /I,  we  should  try 
to  choose  the  displacement  operators  that  give  the  smallest  a.  If  the  matrix  /i  is  an  n  X 
n  Toeplitz  matrix,  the  appropriate  displacement  operator  F  is  Z„,  an  n  X  n  shift  matrix. 
If  A  has  some  near-Toeplitz  structure,  then  F  would  have  forms  such  as 


F^Z„®Z„, 


F=  ©  Z„„ 

I 


F=ZS, 


where  @  denotes  the  direct  sum,  Z„  ©  Z„,  =  [§■  f^],  and  ©f=  i  denotes  the  concatenated 
direct  sum. 

Example  1.  Let  T  =  (/,_y)  be  an  m  X  n  pre-  and  post-windowed  scalar  Toeplitz 
matrix,  i.e.,  Uj  =  0  if  j>i  or  i>m-n+j  with  m'^  n.  Then  it  is  easy  to  check  that 
the  matrix  C  =  (c,-j)  T^T is  also  an  (unwindowed)  Toeplitz  matrix,  and  with  respect 
to  { Z„  ®  Z„ ,  Z„  ©  Z„ } ,  £3  in  ( 3 )  has  a  generator  { A",  F }  of  length  two,  where 

Xi  =  [co,Ci,  •••  ,c„-i,-l,0,  •••  ,0]^/ci'^ 

X2  =  l0,c,,  •••  ,c„-,,-l,0,  •••  ,0lVcy^ 

yi  =  [A),C|,  •  •  •  1,/0,/t,  •  •  •  •  •  •  ,0]Vci'^ 

y2  =  -[0,Ci,  •••  ,c„-,,to,t,,  •••  t„-„,0,  •••  ,0]Vco'^  □ 

Example  2.  If  F  is  a  Toeplitz-block  matrix,  i.e., 

■  F,.,  F,,2  •  F,.;v  ■ 

(6)  F=  Fi.;  =  scalar  m/Xn,  Toeplitz  matrix, 

Tm,\  Tm,2  • 
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then  for  the  matrices  fin  (3),  we  choose  [9],  [15]  the  following  displacement  operators: 
(7a)  f,:f^=f©zJ©F,,  F'’  =  [©zJ©F,,  m  =  n. 


(7a) 

Et:F^= 

•  M 

©Zm, 

1=1 

©F., 

fi>  = 

r  N 

©z„, 

i  =  1 

(7b) 

E2:F^= 

r  ^ 

©z„, 
#=  1 

©F,, 

F*  = 

■  N 

©z„, 

(7c) 

E2.F^= 

•  N 

©z„. 

1=1 

©F,, 

F»  = 

-  N 

®z„, 

1=1 

where  Fi  can  be  either  Z„  or  ©{1 1  Z„-.  However,  for  the  divide-and-conquer  implemen¬ 
tation,  we  prefer  to  choose  ©Jt  i  Z„,;  see  the  remark  in  §  4. 

Example  3.  On  the  other  hand,  if  the  matrix  F  in  (3)  is  a  block-Toeplitz  matrix 
with  X  /3  blocks, 


(8)  T= 


Bo  B-x  ■  B-S+  I 

B\  Bq  •  B-/V+2 


m^Mfi,  n^NP, 


|_Fa/-1  Bm-2‘  B-n+M  \ 

then  for  the  extended  matrices  F,  we  should  choose  [8],  [9]  the  displacement  operators 

(9)  F■^=Z„^©Z„^  F»  =  Z^„®ZL 

where  for  E\  we  assumed  that  F  is  a  square  n  X  /i  matrix. 

Generators  of  the  above  and  other  extended  block-Toeplitz  or  Toeplitz-block  matrices 
can  be  found  in  [8]. 

Polynomial  form  of  generators.  In  general,  the  displacement  operators  F^  and 
for  both  extended  block-Toeplitz  matrices  and  extended  Toeplitz-block  matrices  have 
the  form 

N  N 

(10)  F=®ZS„ 

,=1 

We  shall  say  that  the  displacement  operator  F  in  ( 10)  has  sections.  One  of  the  key 
operations  in  generalized  Schur  algorithms  is  matrix-vector  multiplication  Fv,  i.e.,  a 
sectioned  shift  operation.  With  the  polynomial  representation  of  vectors,  the  shift  oper¬ 
ations  has  a  nice  algebraic  expression.  For  a  given  vector  v,  let  u(  z)  denote  the  polynomial 
whose  coefficient  for  the  term  z'  is  the  (  /  +  1  )st  component  of  the  vector,  i.e., 

(11)  V  =  [Vo,V|,U2,  •••  ,V„-x]^-^V{z)  =  Vo  +  V,Z  +  V2Z^+  z"  . 

Then, 

Z„v»v'  =  [0,»o,Vi,  •••  ,u„_2]^^v(2)zmodz''. 

In  general,  for  the  matrix  whose  displacement  operator  is  the  F  in  ( 1 0 ) ,  let  us  define 
integers  { 5,  }  by 

i 

6/=  Z  6i  < §2 <  •  •  •  < bfi/. 

k-l 


Let  v(z)  and  0(z)  be  polynomials  of  degree  less  than  or  equal  to  n  -  1,  and  define  the 
degree  at  most  (n,  -  1 )  polynomial,  »,(z),  by 

(12a)  v{z)  =  Vi(z)  +  z*'V2(z)  +  z*^V3(z)+  ••  +  z‘'*-'v„^Xz). 
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Given  two  polynomials  v(z)  and  0(z),  and  the  displacement  operator  F  in  ( 10),  the 
(polynomial  form)  displacement  operator  ®f  K  defined  by  the  following  operation: 

(12b)  v(z)®F0(z)^r(z)^r,(z)  +  z^'r2(z)  +  z^^r^(z)+  ■  ■  ■  +z*^-'r„^(z), 

where 

(12c)  ri(z)^Vj(z)0(z^)mod  z"‘, 

i.e.,  r/(z)  is  the  polynomial  Vi(z)0(z^)  after  chopping  off  the  higher  degree  terms,  so  that 
r,(z)  has  the  degree  at  most  («/  -  1 ). 

Let 

A'=[Xi,X2,  •••  ,Xal,  L=[y,,y2,  •••  ,y„] 
be  a  generator  of  a  matrix  A  with  respect  to  certain  { F* } ,  and  let 

\i*--Xi(z),  y,'^>',(H’). 

Then  we  call  the  pair  of  polynomial  vectors  {X(z),  T(  w) } ,  where 

X(z)  =  [Xi(z),X2iz),  •••  ,Xa(z)],  T(w)  =  [;^,(w),>'2(h’),  •••  ,y„(H’)], 

a  (polynomial  form)  generator  of  A,  with  respect  to  (polynomial  form)  displacement 
operator  { ®Ff,  ®f‘'  }  • 

Example  1  (continued).  The  matrix  £3  in  ( 3)  has  a  generator  {X(z),  T(  w) }  with 
respect  to  {©f/,  ©f^},  where  F^=  Z„®  Z„,  F'’  =  Z„®  Z^,  and 

Xi(z)  =  [co  +  CiZ+--‘+c„-,z''~'-z”]cS''^, 

X2(z)  =  [CiZ  +  CZ^+  •  •  ■  +C„.iZ''~' -z’’]cS''^, 

>'|(W’)  =  [^'0  +  C|M'+  •••  +  '  +  •  •  • 

>'2(H')  =  -[C|H'+  •  •  •  +  +  -  +t„-„w'”]cS''^. 

Also  note  that 

Xi(z)®F/Z  =  [CqZ  +  C,Z^  +  ■■■+C„^2Z"~  '  ']Co''^ 

>’|()V)©/r*W' 

=  [CoW  +  CiW^+ - |-C„-2W""  '  +/oW"^  '  - ']Co''^.  □ 

Next  we  note  that  for  given  vectors  a  and  b  such  that  a  #  0,  we  can  always  find 
[8]  matrices  6  and  'if  such  that 

(13)  a^e  =  [a',,0,0,  •••  ,0],  b^'I' =  [6',,0,0,  •  •  •  ,0],  e 

and  therefore,  a^b  =  a'l  b\ .  We  define  polynomial  matrices  0(z)  and  'i^(  w)  by 


(14) 

e(z)-e 

z 

1 

’i'(H’)»^ 

W 

1 

. 

1 , 

1  . 

We  also  remark  that  if  a  =  b,  then  we  could  choose  if(w)  =  ©(w),  and  if  b  =  Sa, 
where  2  «■  Ip®  -  /,,  then  ♦(w)  =  6(^)2,  so  that  we  only  need  to  find,  and  post- 
multiply  by,  0(z). 
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Generalized  Schur  algorithm.  Let  a  matrix  £  have  a  generator  {A'o(r),  LbC  w) }  with 
respect  to  { ®f/,  ©f*  } ,  and  define  by 


£  = 


■£■1.1 


Eu2 

Ei.2\ 


€R” 


where  £|  j  is  a  /:  X  A:  strongly  nonsingular  matrix,  i.e.,  the  one  with  all  nonsingular  leading 
submatrices.  The  fc-step  generalized  Schur  algorithm  18],  [9],  [15]  presented  below  in 
polynomial  form  gives  a  generator  of  the  matrix 

^  ^  ,  S^£2.2-£2.,£7.',£,,2eR‘'"“'^'^'""*', 

with  respect  to  {©/:/,  ©/r*},  or  equivalently,  a  generator  of  S  with  respect  to 
® /■!>},  where  £^and  F''  denote  the  trailing  square  submatrices  of  size  {m  -  k) 
and  («  -  k)  of  £^and  £*,  respectively. 

Algorithm  (k-step  generalized  Schur  algorithm). 

Input:  Generator  of  £,  { A'o(z),  yb(u') } ;  displacement  operator  { ®/./.  ®f*} ; 

Number  of  steps  k. 

Output:  Generator  of  S  { A'^Cz),  y*(H’)} 

Procedure  GeneralizedSchur 
begin 

for  i  :=  Qio  k—  I  do  begin 

a^;=  [z-‘Xi(z)],^o, 
b^:=[z-T,(z)],=o; 

Find  9j(z)  and  ^/(w)  to  transform  a^and  b^such  as  (13); 

+  ,  (Z)  =  T,(Z)©/r^e,(Z);  r,  +  ,  (  H  )  =  y,(  H’)©y:,<',(  M  ) 

end 

return  {A'*(z),  y*(w)} 

end 


Remark.  The  polynomial  vectors,  X/iz)  and  y,(w),  have  degrees  m  -  I  and 
«  -  1,  respectively,  for  all  /.  Each  step  eliminates  the  nonzero  lowest  degree  term,  and 
therefore  the  terms  of  Xi(z)  and  y,(  w)  whose  degrees  are  less  than  z'  and  m  '  are  zeros. 

By  applying  the  generalized  Schur  algorithm  we  can  obtain  generators,  or  equivalently 
displacement  representations,  for  various  interesting  Schur  complements. 

3.  Divide-and-conquer  implementation.  The  (sequential)  A:-step  generalized  Schur 
algorithm  in  §  2  can  also  be  implemented  efficiently  using  the  divide-and-conquer  ap¬ 
proach.  We  shall  only  explain  how  to  find  Xk{z):  essentially  the  same  argument  applies 
for  y*(w). 

Let  us  define  ^p^,^{z)  and  Xp:^(z)  by 

®p:,(z)*0p(z)ep+,(2)  •••  e,(z), 

Xp:if(z)mXo:^(z)®F/eo:p  ;(z),  A(o:,(z)  “  A'o(z)  mod  Z’^  ' , 

where  0  ^p^g.  The  polynomial  matrix  6^,(  z)  has  a  degree  q-  p  +  1 .  The  polynomial 
vector  Xp.j,(z)  has  degree  q,  and  is  obtained  by  dropping  from  Xp{z)  all  terms  of  degree 
higher  than  z".  Also  note  the  useful  properties, 

lx(z)®Fe,(z)]®F$2iz)  =  x(z)®Fie,{z)e2(z)], 

lX,iz)  +  X2(z)]®Fe{z)=[X,(z)®F0(z)]  +  [X2(z)®Fe(z)]. 
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These  properties  and  the  fact  that  0p:,(2)  is  completely  determined  by  Xp.„{z)  allow  a 
divide-and-conquer  implementation  of  the  generalized  Schur  algorithm. 

Given  A'p;,(z),  we  can  compute  0p:„(z)  as  follows.  lfp  =  g,  then  we  are  successful, 
and  compute  %.p{z)  =  %(z).  Otherwise,  we  choose  an  “appropriate”  (see  §  4)  division 
point  r  such  that  p<r<q,  and  try  to  solve  the  smaller  subproblem  of  finding  0^:,  _  i  ( z ) , 
given  Xp:r-{(z).  Once  we  know  0p,_  i(z),  we  can  compute  Xr.^(z)  by 

(15a)  Xr:^(z)=Xo:g(z)®F/eo:r-l(z)  =  lXo:g(z)®f/eo:p-l(z)]®f/Bp:r-i(z) 

(15b)  =Xp.Jz)®f,ep..r-i(z). 

Now  we  again  try  to  find  Br.q(z)  given  Xr.g(z).  After  we  obtain  0r:,(z),  we  can  combine 
the  two  results,  0p;,_  i(z)  and  0r:,(z),  by  multiplication, 

(16)  0p:«(2)  =  0p:r-l(z)0r:,(z). 

Programming  details  of  the  above  recursive  generalized  Schur  algorithm  are  shown  in 
the  Appendix. 

The  previous  recursive  description  can  be  visualized  nonrecursively  using  trees 
(See  Figs.  1  and  2).  Each  node  in  the  tree  is  annotated  with  the  rules:  “find,”  “apply,” 
and  “combine,” 


fp’.p'.  Find0p;p(z), 

ttp:q-  r:q(z):=  Xp-_g{z)®FBp-,r  - \(z), 

t^ir.q'  ^p:qiz):~  Bp:r  - \{z)&r-q{z)- 

We  traverse  the  tree  in  post-order  ( i.e.,  follow  the  order  labeled  on  each  node  of  the  tree ) , 
and  evaluate  the  rules. 


Fig  .  1 .  Sequence  of  compulations  for  Example  4. 
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Fig.  2.  Sequence  of  computations  for  Example  5. 


Now,  we  shall  consider  two  examples  in  detail. 

Example  4,  Pseudoinverse  of  pre-  and  post-windowed  Toeplitz  matrices.  Consider 
the  matrix  £3  in  Example  1 ,  where 


'16 

8 

4 

1  ' 

'3 

2 

1 

1 

-1 

0 

0 

0  ' 

T^T= 

8 

16 

8 

4 

0 

3 

2 

1 

1 

-1 

0 

0 

4 

8 

16 

8 

f  * 

0 

0 

3 

2 

1 

1 

-1 

0 

1 

4 

8 

16 

0 

0 

0 

3 

2 

1 

1 

-1 . 

We  would  like  to  find  a  displacement  representation  of  {T^TY'T^.  This  can  be  done 
by  the  four-step  recursive  generalized  Schur  algorithm.  The  input  to  the  algorithm  is  a 
generator  {Xo{z),  Fo(vt')}  of 


£3  = 


T^T 

-I 


TT- 

O 


with  respect  to  {©//,©/r*},  where  F^=Z„@Z„,  F'’ =  Z„®  Z„.  The  output 
1X4(2),  )4(w)}  is  a  generator  of  (T^T)~'T^  with  respect  to  {®z,,  ®z„}.  The  com¬ 
putational  sequence  is  illustrated  in  Fig.  1,  where  it  is  assumed  that  the  division  points 
were  chosen  successively  by  two,  one,  and  three. 


tn 

[2] 

13] 

[4] 


Ao:®0:o(^)  ~ 


1  0 

z 

0  -1 

1, 

because  A'o:o(z)  =  [4,0]. 


Oo:!  :^l:l  (^)  =  Ao,l  (z)©/r/eo:o(z)  =  [4-1- 2r,  2z]©yr/eo:o(^)  =  [4z, -2z]. 


V3 


CO:i:0O:l(z)  =  eo«(z)e|:|(z) 


’ 

V3'.- 


z" 

z/2 


-z/21 
I 
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Remark  1 .  For  a  symmetric  generator  of  length  two  with  =  1 ,  the  2  X  2  polynomial 
matrix  ©(z)  in  ( 14)  can  have  the  form  (hyperbolic  reflection) 


0,(z) 


chjZ  shi 
-shjZ  —cftil' 


chj  —  shj=  1. 


Let 


Qp„(z)  =  Qf,(z)&p+  ,(z)  •  •  •  0,(z)  = 


■0i..(2) 

.02.1  (Z) 


0|.2(^)' 

02.2(2). 


Then,  by  induction,  we  can  easily  prove  that 

Z«-^^'0,,,(Z-')  =  (-l)«-^^'e2,2(z),  Z«-'’^‘0,.2(Z-')  =  (-l)’^'’^'02,,(z). 


Therefore,  we  need  to  compute  and  store  only  two  entries  of  0p:,(z). 

Remark  2.  For  an  unwindowed  scalar  Toeplitz  matrix,  the  matrix  E2  in  (3)  has 
displacement  rank  four,  whereas  the  matrix  £3  has  displacement  rank  five.  Therefore, 
when  we  solve  Toeplitz  least  squares  problems,  it  is  more  efficient  to  find  a  displacement 
representation  of  (7’^7’)"‘  rather  than  of  {T^T)~'T^.  With  the  notation  in  (4),  the 
matrix  £2  for  an  unwindowed  scalar  Toeplitz  matrix  T  =  (?,-y)  €  (m  ^  n)  has  a 
generator  [15], 

wi  =  r^t,/||til|,  W2  =  t2,  yi2  =  Z„Zlyix,  W4  =  Z„1, 

,  t2  =  [0,  /-I,  •■•,/]-«]  ,  • 

Vi  =  V3  =  ei/||till,  V2  =  V4  =  0, 


where  H  •  H  denotes  the  Euclidean  norm,  and  ci  is  the  vector  with  one  in  the  first  position, 
and  zeros  elsewhere. 

Example  5.  Displacement  representation  for  the  inverse  of  a  Sylvester  matrix.  Let 
T  denote  the  following  Sylvester  matrix. 


(17) 


2  0  0  10 
12  0  2  1 
3  12  12 

0  3  111 

0  0  3  0  1 


and  suppose  that  it  is  desired  to  obtain  a  displacement  representation  of  £  '.  Then  the 
appropriate  extended  matrix  is 


(18) 


I 

O 


and  it  is  easy  to  see  that  the  following  {Xo(z),  Jo(h’)}  is  a  generator  of  £|  with  respect 
to  { ®F/,  ©F* } ,  where  F^=  Z^®  Z^,  F*’  =  Z3  ©  Z2  ©  Zj; 

,To(z)«[X|(z),X2(z),X3(z)],  To(M')«®[>'l(M'),y2(M'),F3(W’)], 

(19a)  JC|(z)  =  2  +  z  +  3z^-z’,  3:2(2)=  1 +2z  +  z^  +  z^-z*, 

(19b)  >'i(w)=l,  y2(»v)  =  H'^  y3(w’)=w^ 


3:3(2)=  1, 
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Now  the  five-step  recursive  generalized  Schur  algorithm  gives  a  desired  generator  of  r~‘, 
with  respect  to  { Z5 ,  Z5 } ,  and  a  possible  computational  sequence  is  shown  in  Fig.  2, 
where  the  division  points  are  chosen  successively  as  two,  one,  three,  and  four. 


z 

4  -n 

w 

0 

O' 

[1] 

^0:0-00:0(2)=  0 

1 

0  . 

^0:0(W-)=  W/2 

1 

0 

.0 

0 

1 . 

.w/2 

0 

1 . 

[2]  ao:,;^,:.(r)  =  [2z,3z/2,-z/21,  r,:,(w)  =  [w,0,0]. 


’  2  -i  1  r  w  00' 

[3]  /i:i:Qi:i(2)=  0  10,  ^'i:|(W')=  3^/4  1  0  . 

.0  0  ij  L-W4  0  1. 

’z^  -3Z/4-1/2  z/4-1/2' 

[4]  Co:|  .‘©omIz)  =  0  1  0  , 

.0  0  1  . 

0  o' 

^0:i(w')=  h’V2  +  3w/4  1  0  . 

.  wV2-u'/4  0  1. 

[5]  Cki-.A:X2.^(z)=  [2z2  +  zH3z\-5zV4-5zV4,-5zV4  +  3zV4], 

y2;4(W')=  yO:4(M')©fi>^0:l('^) 

=  [( l,0,0)^o:i(w')  mod  w^]  +  H'^[(0, mod  w^] 
=  [H'^-t-3H'‘*/4,  w^O]. 

w  0  0' 

^2:2(W')=  -Sw/S  1  0  . 

,-5w/8  0  1. 

[7]  a2:4:X3:4(z)=  [2z^  +  z^ -5z  VS  +  1  Sz^/S,  1  Iz  VS  +  1 5zV8  ] , 

y3:Aw)=  y2:4(w)®fi'^2:2(w) 

=  [(W'^0,0)^'2:2(H’)  mod  H'^J-|-V1'^[(3h'/4,  1,0)’^2:2(‘^’)  mod 
=  [-5wV8,H'^0]. 

'0  1  0]  r-16w/5  1  O' 

[8]  A3:03:3(z)=  ^  T  T  *  ♦3:3(»V)=  W  0  0. 

.0  0  1 J  L-llw/5  0  1. 

[9]  a3:4:A^4:4(z)  =  I-5zV8,7z^6z"],  F4:4(w)=[w«,-5h>V8,0]. 

■  z/(2V2)  28/(5V2)  f  ■ 

[10]  C4:4: 04:4(2)=  -5z/(16V2)  l/(2]f2)  -I  , 

0  0  1. 

■  w(2V2)  5/(16V2)  O' 

♦4:4(m')=  -28wA5V2)  1/(2V2)  0  . 

.  -12V2W/5  0  1. 


i  i 

[6]  /2:2:e2:2(2)=  0  1  0 


0  0  1 


140 


J.  CHUN  AND  T.  KAILATH 


Operations  [  1 1  ]-[  13]  are  obvious.  After  evaluating,  £-3:4,  €2-4,  and  Cqa,  we  obtain 
0o:4(z)  and  ^o:4(  w),  and  finally, 

[14]  ao:9:-^0:9(^)=  [Xi(z),X2(z),X3(z)]©f/©0:4(z) 

=  z’[(-l,-Z^O)©^-/eo;4(z)l 

=  Z^[(-l,-Z^O)eo:4(z)modz*]  =  Z*[«,(z),M2(z),M3(^)l, 

where 

«,(z)  =  -z/(2V2)-zV(2V2)  +  zVV2  +  zVV2, 

«2(z)  =  4/(5V2)  +  4z/V5+  16zV(5]^)-28zV(5\^)-28zV(5V2), 

U3(z)  =  2/5  +  z/5  +  2zV5  +  z^/5  -  6zV5. 

l'o.9(w)=  [yi(w),y2(w),y3(w)l©3r6^0:4(w) 

=  w’[(0,0,  1)©f*^0:4(w)]  =  W*[U,(w),U2(H'),U3(w)], 

where 

u,(w)  =  -12V2W/5  +  12wV(5]^)+  12wV(5V2)-  12wV(5V2), 

U2(H')  =  -w/V2  +  wV(2V2)  +  M'V(2y2)-wV(2V2), 

U3(W)=  1. 

Therefore, 

r-'=L(u,)L''(v,)  +  L(u2)L^(v2)  +  L(U3)L^(v3), 


where  U/  and  v,  are  the  vectors  whose 7th  component  is  the  coefficient  of  z-*" '  and  w^~ ' 
of  U/(z)  and  u,(w),  respectively.  □ 

Remark  3.  If  we  had  chosen  the  displacement  operator  ~  ©  Zj  Q  Z2, 

Z3  ©  Z2  ©  Z5  for  the  matrix  T in  ( 17)  we  would  have  the  same  generator  ( 19)  for  , 
but  the  obtained  generator  of  T"'  would  be  the  one  with  respect  to  {Z3  ©  Z2,  Z5 }  rather 
than  with  respect  to  { Z5 ,  Z5 } .  The  displacement  ranks  of  T~ '  with  respect  to  both 
displacement  operators  are  two,  but  the  above  procedure  gives  nonminimal  generators 
of  length  three. 

Remark  4.  The  following  extended  matrix: 


(20) 


■  T 

-/ 


T  =  Sylvester  matrix 


also  has  a  displacement  rank  of  three.  We  could  as  well  obtain  the  solution  r"'b  directly 
by  applying  the  recursive  generalized  Schur  algorithm  to  (20);  the  last  column  of  X, 
where  { A',  y }  is  the  computed  generator  of  T^'b  with  respect  to  {Z„,  1 } ,  can  be  shown 
to  be  the  solution  T^'b. 


4.  Polynomial  products  with  fast  convolutions.  The  product  of  two  polynomials  of 
degree  dt  and  d2  can  be  performed  efficiently  using  d  ^  di  +  d2  +  I  point  fast  cyclic 
convolution  algorithms  [4].  A  d-point  fast  cyclic  convolution  needs  Oid  log  d)  flops. 
Among  others.  Fast  Fourier  Transforms  (FFTs)  can  be  used  for  convolutions,  and  Ammar 
and  Gragg  [2]  carefully  examined  the  use  of  FFTs  for  a  doubling  algorithm  for  square 
Toeplitz  systems  of  equations.  We  shall  only  consider  the  subtle  complications  that  arise 
in  the  recursive  generalized  Schur  algorithm  in  this  paper. 
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The  polynomial  matrix-matrix  product  of  ( 16)  needs  of  q  —  p  point  cyclic  con¬ 
volutions.  The  polynomial  vector-matrix  product  of  (15b)  has  of  scalar  polynomial 
products  of  the  form,  x(z)®f/0(z),  where  x(z)  is  a  polynomial  with  nonzero  terms  of 


-P  yP+l  ... 

^  ^  »  » 

z".  Let  us  assume  that 

0<5i  <  •  • 

•  <d,^p<di+i<  <6,^^<5,+  i  <  ■  •  •  <6^ 

Then 

(21a)  x'(z) 

=  x{z)®ff6(z) 

=  [z*'X/+,(z)  +  Z^'+'X/+2(z)+  •••  +Z**X.S+,(Z) 

(21b) 

+  ---+z'*'x,+  ,(z)]©f/e(z) 

(22a) 

=  [z*'X/+,(z)+  •  •  •  +Z^‘-‘Xsiz)]®FfOiz) 

(22b) 

+  z‘‘[Xs-n(z)0(z^)  mod  z"‘*'] 

(22c) 

+  z*"+  '[Xj  +  2(z)6(z^)  mod  z"**^] 

(22d) 

+  z^‘[x,+  i(z)6(z^)  mod  z"'*']. 

The  terms  in  ( 22a)  do  not  need  to  be  computed  because  these  terms  will  be  summed  to 
zeros  after  adding  all  the  partial  sums  in  the  vector-matrix  multiplication  of  ( 1 5b ) .  Recall 
that  Xi(z)  has  degree  and  d(z^)  has  degree  Therefore,  the  product  Xi{z)d(z^) 

from  (22b)  to  (22d)  can  be  performed  by 

2n,+  I  point  cyclic  convolutions  if  degree  [^(z^)]  ^degree  [x,(z)], 

1  point  cyclic  convolutions  if  degree  [0(z®)]  <  degree  [x,(z)]. 

Remark  5.  Note  that  two  dll  point  convolutions  take  cd  log  (<f/2)  flops  if  one  d 
point  convolution  takes  cd  log  d  flops.  Therefore,  the  polynomial  product  (21 )  is  more 
efficient  for  the  displacement  operator  f^with  more  sections,  because  such  displacement 
operators  break  a  long  convolution  into  many  smaller  convolutions.  Therefore,  for  a 
given  matrix  we  prefer  to  choose  a  displacement  operator  with  as  many  sections  as 
possible,  while  keeping  the  displacement  rank  minimal.  Also  we  remark  that  the  first 
and  last  terms  (22b)  and  (22d)  need  smaller  point  convolutions. 

If  the  dimensions  of  the  matrix  are  powers  of  two,  then  we  can  always  choose  the 
center  division  point  r  =  \{p  +  ^)/21.  This  balanced  division  (or  doubling)  gives  the 
least  number  of  computations,  in  general.  For  this  case,  p  -  q,  and  T(t\)  denote 

the  number  of  computations  for  one  recursion.  Then 

7’(.,)^27’(.;/2)+  (F(i,),  lV(v)^0{a^r,  log  »?), 

and  therefore,  we  can  show  [  1  ]  that  the  /c-step  recursion  takes 

T{k)^0(a^k  log^  k). 

However,  in  most  cases  the  doubling  is  not  possible,  and  for  such  circumstances, 
the  desirable  choice  of  r  is  such  that  r  -  p  and  ^  -  r  +  1  are  highly  composite  numbers 
(so  that  fast  convolution  algorithms  can  be  applied  efficiently),  as  well  as  r  is  close  to 
(9  ~  P)/2  (so  as  to  achieve  balancing). 
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Matrix-vector  products  using  displacement  representation.  The  final  step  of  finding 
solutions  for  linear  equations  is  the  matrix-vector  multiplication  S'b,  given  a  displacement 
representation  of 

(23) 

i  =  I 

where  the  length  a  is  a  multiple  of  the  block  size  a  =  ^5,  say,  and 
F^=  ©  Z^,,  F*=  ©  Z^,  m  =  ^nii,  «  =  Z  «/• 

i-i  /-I  /=,  /=i 

The  expression  in  (23)  can  be  rewritten  in  the  block  displacement  form 

(24)  S='ZWi,F^)Kl{Y„F‘’),  y,GR""^ 

/  =  1 

where 

(25a)  K»{Xi,F^)  =  [Xi,F^Xi,Ff^Xi,  ■  •  •  '1x,]gR'”^", 

(25b)  /:^(T„F*)=[r„F*y„F“y„ ...  ,F*«'’^^>-'>r,]6R'’^". 

Furthermore,  because  F^and  F*  have  A/ and  sections,  respectively,  (25a)  and  (25b) 
have  the  forms 


Kg(X^,F^)^ 

'  Kg(X,j,Zi,) 
Kg(X2j,Zi,) 

O' 

o 

,  Kg{Yi,F^)^ 

'  KglYu.Zlf 
Kg{Y2,,Z%) 

o' 

o 

_  Kg(XMj,Z^„J 

o 

KgKYs.M,^ 

o 

where  KflX,  Z^)  is  the  block  lower  triangular  Toeplitz  matrix  with  the  first  column 
block  X.  The  matrix  O  denotes  a  null  matrix  of  appropriate  size  such  that  K^(Xi,  F^) 
and  Kg(  T,,  F^)  are  m  X  Ti  3nd  ft  ft  mstriceSy  respectively. 

To  see  how  to  use  convolutions  for  the  product 

Kg{Xi,F^)Kl(Yi,F>’)b 

it  is  enough  to  consider  matrix-vector  multiplications  of  the  form  KgiX,  Z^)b.  Note 
that  Kg(  X,  Z'’)b  can  be  expressed  as  sum  of  j3  products  of  scalar  lower  triangular  Toeplitz 
matrix  and  vectors.  As  an  example, 


do  ^0 

bo 

a\  C| 

bi 

Cj  C2  Oo  Cq 

bi 

aj  cj  a,  c,  _ 

ao 

bo 

'  Co 

'bi' 

fll  Oo 

0 

t 

C]  Co 

0 

02  <t|  flo 

bz 

"r 

C2  Cl  Co 

bs 

_  O3  02  _ 

0 

C]  C2  Cl  Co  _ 

0 

The  multiplications  in  the  right  sides  of  (26)  can  be  done  by  fast  convolutions,  and 
therefore,  so  can  the  multiplication  ^b. 

5.  Concluding  remarks.  We  have  presented  0{a^n  log^  n)  algorithms  for  the  de¬ 
termination  of  exact  and  least  squares  solutions  of  linear  systems  with  matrices  having 
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(generalized )  displacement  rank  a.  Such  algorithms  for  exact  solutions  have  been  studied 
by  several  authors,  most  recently  by  Ammar  and  Gragg  [2]  for  Toeplitz  systems.  They 
also  made  a  very  close  study  of  the  implementation  of  the  convolution  operation  in  an 
attempt  to  obtain  the  smallest  coefficient.  Although  we  have  not  attempted  so  close  an 
operation  count  for  the  more  general  algorithm  in  our  paper,  the  hidden  constant  in  the 
operation  counts  for  solving  Toeplitz  least  squares  problems  is  quite  high  because  a  = 
4  for  the  matrices  £2  or  £3  (see  (3))  with  a  full  rectangular  T.  Also  we  conjecture  that 
our  algorithm  suffers  numerical  stability  problem  when  £|,i  in  ( 1 )  has  a  leading  principal 
submatrix  that  is  close  to  singular;  nevertheless  we  might  hope  that  numerical  refinements 
devised  for  the  Schur  algorithm  (see,  e.g.,  Koltracht  and  Lancaster  [18])  may  be  carried 
over  to  the  divide-and-conquer  framework  as  well. 

We  also  mention  that  the  fast  algorithms  for  Hankel  and  close-to-Hankel  ma¬ 
trices  in  [10]  can  be  implemented  with  divide-and-conquer  fashion  using  the  spirit  in 
this  paper. 

Appendix.  We  shall  summarize  the  explanation  in  §  3  using  a  Pascal-like  recursive 
procedure.  First,  note  that  the  polynomial  0p;,(z)  (and  ^p:,(z))  has  q  -  p  +  2  terms. 
The  first  column  of  0p:,,(z)  has  terms  ranging  from  degree  z  to  ',  and  the  other 
columns  have  terms  from  1  to  z^~^.  Hence,  by  shifting  the  first  column  by  one  position, 
we  can  store  0p;,(z)  and  ^p:,(z)  in  the  array  “Poly”  from  ptoq  slots  inclusive: 

Poly:  array  [l..a,  l..a,  0..MAX-1]  of  record 
6:  coefficients; 

\f/:  coefficients 

end; 

The  computation  of  Qp-qiz)  is  sequential,  i.e.,  once  we  compute  0p;,(r),  we  do  not  need 
to  keep  0p;r- 1  ( z),  and  therefore,  the  array  “Poly”  can  be  kept  as  a  single  global  variable. 

The  polynomial  vector  Xp:^{z)  has  q-  p  +  I  terms,  and  therefore,  can  be  stored  in 
an  array  type  GENERATORS: 

type 

GENERATORS  =  array  [l..a,  0..MAX-1]  of  record 
x:  coefficient; 
y:  coefficient 

end 

However,  Xp.j,(z)  cannot  be  kept  as  a  global  variable,  and  local  copies  should  be  maintained 
until  we  compute  Xr.g(z). 

Now  we  can  describe  the  recursive  generalized  Schur  algorithm  as  follows. 

Algorithm  (recursive  /c-step  generalized  Schur  algorithm ). 

Input:  Generator  of  £,  { Ao( z),  Yo{ w) } ;  displacement  operator  { ©f/,  0^* } ; 

Number  of  steps,  k. 

Output:  Generator  of  S,  { Xk{z),  y*(  w) } ; 
procedure  RecursiveSchur 
var 

G,  LowerG:  GENERATORS; 

begin 

Find(0,it-1,G); 

Apply(0,  k,  n,  G,  LowerG); 
return  (LowerG) 


end 
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The  procedure  Find  (p,  q,  G)  computes  Qfrj,{z),  and  w)  given  { Xp-^/iz),  Yp„(  w) } , 
and  the  procedure  Apply  (p,  r,  q,  G,  LowerG)  returns  LowerG  =  {Xr,Qiz),  yr;,(w)} 
given  G  =  {Xpg(z),  Yp,,(w)} 

procedure  Find(p,  q:  index;  G:  GENERATORS); 
var 

r:  index; 

G,  LowerG;  GENERATORS; 
begin 

ifp  =  q  then  begin 

Compute  0p:,(z)  and 
return 
end 

r  :=  appropriate  integer  close  to  r(p+^)/21; 

Find(p.  r-1,  G); 

Apply(p,  r,  q,  G,  LowerG); 

Find(r,  q,  LowerG); 

( *  Use  fast  convolution  for  polynomial  products  * ) 

0p:,(z)  :=  0p:r-|(z)©r,,(z); 

end 

procedure  Apply  (p,  r,  q:  index;  G:  GENERATORS;  var  LowerG:  GENERATORS); 
begin 

(*  Use  fast  convolution  for  polynomial  products  *) 

Xr:q(z)  :=  Xp,g{z)®Ff%r  -  i{z)\ 

Yr.g{w)—  yp;«(w)©p*^p.,_,(H'); 

LowerG  :=  {Xr,q{z), 

Free  the  storage  of  { Xp^(  z ) ,  Ypg(  w) } ; 
return  (LowerG); 
end 
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